Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Compilation optimizations for inconsistent control flow on deep computer unit
Xiaoyi YANG, Rongcai ZHAO, Hongsheng WANG, Lin HAN, Kunkun XU
Journal of Computer Applications    2023, 43 (10): 3170-3177.   DOI: 10.11772/j.issn.1001-9081.2022091338
Abstract182)   HTML10)    PDF (4315KB)(80)       Save

The domestic DCU (Deep Computer Unit) adopts the parallel execution model of Single Instruction Multiple Thread (SIMT). When the programs are executed, inconsistent control flow is generated in the kernel function, which causes the threads in the warp be executed serially. And that is warp divergence. Aiming at the problem that the performance of the kernel function is severely restricted by warp divergence, a compilation optimization method to reduce the warp divergence time — Partial-Control-Flow-Merging (PCFM) was proposed. Firstly, divergence analysis was performed to find the fusible divergent regions that are isomorphic and contained a large number of same instructions and similar instructions. Then, the fusion profit of the fusible divergent regions was evaluated by counting the percentage of instruction cycles saved after merging. Finally, the alignment sequence was searched, the profitable fusible divergent regions were merged. Some test cases from Graphics Processing Unit (GPU) benchmark suite Rodinia and the classic sorting algorithm were selected to test PCFM on DCU. Experimental results show that PCFM can achieve an average speedup ratio of 1.146 for the test cases. And the speedup of PCFM is increased by 5.72% compared to that of the branch fusion + tail merging method. It can be seen that the proposed method has a better effect on reducing warp divergence.

Table and Figures | Reference | Related Articles | Metrics